Scalable, Balanced Model-based Clustering

Authors

  • Shi Zhong
  • Joydeep Ghosh
Abstract

This paper presents a general framework for adapting any generative (model-based) clustering algorithm to provide balanced solutions, i.e., clusters of comparable sizes. Partitional, model-based clustering algorithms are viewed as an iterative two-step optimization process: iterative model re-estimation and sample re-assignment. Instead of a maximum-likelihood (ML) assignment, a balance-constrained approach is used for the sample assignment step. An efficient iterative bipartitioning heuristic is developed to reduce the computational complexity of this step and make the balanced sample assignment algorithm scalable to large datasets. We demonstrate the superiority of this approach to regular ML clustering on complex data such as arbitrary-shape 2-D spatial data, high-dimensional text documents, and EEG time series.
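
The two-step loop described in the abstract can be illustrated with a minimal sketch. It assumes the simplest generative model (spherical Gaussians with equal variance, so the negative log-likelihood reduces to squared distance, as in k-means) and replaces the ML assignment with a greedy capacity-constrained assignment. The greedy rule is a stand-in chosen for brevity, not the paper's iterative bipartitioning heuristic, and all function names here (`balanced_assign`, `reestimate`, `balanced_kmeans`) are hypothetical.

```python
import math
import random


def sq_dist(a, b):
    """Squared Euclidean distance (negative log-likelihood up to constants)."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def balanced_assign(points, centers):
    """Greedy balance-constrained assignment (an illustrative assumption).

    Each cluster accepts at most ceil(n / k) points. Point-cluster pairs are
    visited from cheapest to most expensive, and a point is given to the
    first cluster that still has room.
    """
    n, k = len(points), len(centers)
    cap = math.ceil(n / k)
    pairs = sorted(
        (sq_dist(p, c), i, j)
        for i, p in enumerate(points)
        for j, c in enumerate(centers)
    )
    labels = [None] * n
    sizes = [0] * k
    for _, i, j in pairs:
        if labels[i] is None and sizes[j] < cap:
            labels[i] = j
            sizes[j] += 1
    return labels


def reestimate(points, labels, old_centers):
    """Model re-estimation step: each center becomes the mean of its members."""
    dim = len(points[0])
    centers = []
    for j, old in enumerate(old_centers):
        members = [p for p, l in zip(points, labels) if l == j]
        if members:
            centers.append(
                tuple(sum(m[d] for m in members) / len(members) for d in range(dim))
            )
        else:
            centers.append(old)  # keep the old center if a cluster is empty
    return centers


def balanced_kmeans(points, k, iters=20, seed=0):
    """Alternate balanced assignment and re-estimation until labels stabilize."""
    rng = random.Random(seed)
    centers = rng.sample(points, k)
    labels = balanced_assign(points, centers)
    for _ in range(iters):
        centers = reestimate(points, labels, centers)
        new_labels = balanced_assign(points, centers)
        if new_labels == labels:
            break
        labels = new_labels
    return labels, centers
```

Sorting all point-cluster pairs costs O(nk log(nk)), which is fine for a toy example but is exactly the kind of cost a scalable heuristic would need to avoid on large datasets.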


Similar resources

Clustering with Balancing Constraints

In many applications of clustering, solutions that are balanced, i.e., where the clusters obtained are of comparable sizes, are preferred. This chapter describes several approaches to obtaining balanced clustering results that also scale well to large data sets. First, we describe a general scalable framework for obtaining balanced clustering which first clusters only a small subset of the data ...


Testing Several Rival Models Using the Extension of Vuong's Test and Quasi Clustering

The two main goals in model selection are, first, to introduce an approach for testing the homogeneity of several rival models and, second, to select a set of reasonable models or to estimate the rival model closest to the true one. In this paper we extend Vuong's method to several models in order to cluster them. Based on the working paper of Katayama (2008), we propose an approach to test whether rival models h...


An Adaptive LEACH-based Clustering Algorithm for Wireless Sensor Networks

LEACH is the most popular clustering algorithm in Wireless Sensor Networks (WSNs). However, it has two main drawbacks: random selection of cluster heads and direct communication of cluster heads with the sink. This paper introduces a new centralized cluster-based routing protocol named LEACH-AEC (LEACH with Adaptive Energy Consumption), which guarantees to generate balanced cl...


A New WordNet Enriched Content-Collaborative Recommender System

Recommender systems are models that predict the potential interests of users among a number of items. These systems are widespread and have many real-world applications. They are generally based on one of two structural types: collaborative filtering and content filtering. Some systems are based on both of them. These systems are named hybrid recommen...




Publication date: 2003